This paper is cited in the following contexts:


Effective Methods for Improving Naive Bayes Text Classifiers - Kim, Rim, Yook, Lim

nearest neighbor classifiers [7], naive Bayes classifiers [5], and support vector machines [3], etc. Among
these methods, naive Bayes text classifiers have been widely used because of their simplicity, although
they have been reported as one of the poorer-performing classifiers in text categorization tasks [8, 2]. Since
several studies show that naive Bayes performs surprisingly well in many other domains [1], it is worth
clarifying why naive Bayes fails in text classification tasks and how it can be improved. This paper
describes the problems in traditional naive Bayes text classification ....

…
$$\frac{\cdots + \sum_{i=1} TF_{ik}\, P(y_i \neq c_j \mid d_i)}{\cdots + \sum_{i=1} TF_{is}\, P(y_i \neq c_j \mid d_i)} \qquad (4)$$
The Laplacean prior is used to avoid probabilities of zero or one. For our experiment, the prior parameter is set to 1.
This estimation technique has been generally used to implement naive Bayes classifiers in most studies [8, 2, 5].
There are, however, some issues in estimating parameters and calculating scores. The parameter estimation
according to formula (3) regards all of the documents belonging to c_j as one huge document. In other words,
this estimation method does not take into account the fact that there may be important ....
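As a minimal sketch of the pooled estimate that the snippet attributes to formula (3), the Python below assumes hard class labels (so P(y_i = c_j | d_i) is 1 for a document's own class and 0 otherwise), sums the term frequencies of every training document in a class into one count table, and applies a Laplace prior of 1, the value the snippet reports. The function names and toy corpus are hypothetical; this is the baseline estimator the snippet criticizes, not the improved method of Kim et al.

```python
import math
from collections import Counter


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate P(c) and P(w | c) with a Laplace prior of `alpha`.

    Each class's documents are pooled into one "huge document": term
    frequencies are summed over the class before smoothing.
    """
    vocab = {w for doc in docs for w in doc}
    priors, cond = {}, {}
    for c in sorted(set(labels)):
        class_docs = [doc for doc, y in zip(docs, labels) if y == c]
        priors[c] = len(class_docs) / len(docs)
        counts = Counter(w for doc in class_docs for w in doc)  # pooled TF
        total = sum(counts.values())
        cond[c] = {w: (alpha + counts[w]) / (alpha * len(vocab) + total)
                   for w in vocab}
    return priors, cond, vocab


def classify(doc, priors, cond, vocab):
    """Return the class with the highest log-posterior; unseen words are ignored."""
    def log_score(c):
        return math.log(priors[c]) + sum(math.log(cond[c][w]) for w in doc if w in vocab)
    return max(priors, key=log_score)


# Hypothetical toy corpus: each document is a list of tokens.
docs = [["buy", "cheap", "pills"], ["cheap", "cheap", "offer"],
        ["project", "meeting", "agenda"], ["meeting", "notes"]]
labels = ["spam", "spam", "ham", "ham"]
priors, cond, vocab = train_multinomial_nb(docs, labels)
print(classify(["cheap", "offer"], priors, cond, vocab))  # spam
```

Once the counts are summed per class, document boundaries are gone: a word repeated many times in one document contributes exactly as much as the same count spread over many documents.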
Using rough sets to construct sense type decision trees.. - Computing And..

Text categorization, the assignment of natural language texts to one or more predefined categories
based on their content, is an important component in information organization and management tasks.
There is an increased interest in developing technologies for automatic text categorization [3].
There are two different ways of approaching the problem: category extraction and category assignment
[8]. In this paper, we focus on the category extraction problem. In a previous work [2, 1], following the
treatment of strongly typed functional programming languages, we have shown that ....

S. T. Dumais, J. Platt, D. Heckerman, and M. Sahami, "Inductive learning algorithms and representations for
text categorization," in Proc. 7th Int. Conf. on Information and Knowledge Management (CIKM '98), Nov. 1998, pp. 148-155.



JHU/APL at TREC 2002: Experiments in … and Arabic - Paul McNamee, Christine …

misspelled words, broken plurals, and infix morphology, and empirically evaluated techniques to
overcome them. Larkey et al. [8] investigated methods for effectively stemming Arabic.

apl11Fah1  0.342  0.104  0.377  0.039
apl11Fah2  0.342  0.104  0.377  0.039
apl11Faq1  0.059  0.09   0.084  0.369
apl11Faq2  0.085  0.118  0.115  0.355

Table 5. APL Adaptive Results, Assessor topics

Clearly, the heap approach returned too few documents, whereas the queue approach returned too many.
This is probably mainly due to the much lower amount of feedback; it was probably also adversely
affected by our choice of ....

parameter that guessed too often. Results Discussion. A low training and feedback situation
seems to require more of a Statistical Language Model, score-based approach. Based on the
good performance possible in situations with lots of training and feedback (as in TREC 2001),
there seems to be a continuum between score-based and classification approaches, depending on
the amount of training and feedback available. We conjecture that a hybrid approach will be useful to
support this continuum. Given the successful reports of n-gram-based retrieval for Arabic, we opted ....
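One common reading of "n-gram-based retrieval" is indexing overlapping character n-grams instead of whole words, which can let morphological variants share index terms without a stemmer. The sketch below assumes n = 4 and '_' as a word-boundary marker; both are illustrative choices, not the APL configuration, and `ngram_overlap` is a hypothetical, deliberately crude match score.

```python
def char_ngrams(text: str, n: int = 4) -> list[str]:
    """Overlapping character n-grams of `text`.

    Words are joined with '_' (also added at both ends) so that n-grams
    can record word starts and ends; text no longer than n yields one term.
    """
    padded = "_" + "_".join(text.split()) + "_"
    if len(padded) <= n:
        return [padded]
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def ngram_overlap(query: str, doc: str, n: int = 4) -> int:
    """Number of distinct n-grams shared by query and document."""
    return len(set(char_ngrams(query, n)) & set(char_ngrams(doc, n)))


print(char_ngrams("abc de"))  # ['_abc', 'abc_', 'bc_d', 'c_de', '_de_']
```

The resulting n-grams can be indexed and weighted like ordinary terms; Python strings are sequences of Unicode code points, so the same function applies unchanged to Arabic text.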
PreBIND and Textomy - mining the biomedical.. - Donaldson.. (2003)

(Fig. 2, item 4). Textomy (Fig. 2, item 7; http://www.litminer.ca) retrieves these abstracts from PubMed
and assigns a score that describes the relative likelihood that the abstract contains molecular interaction
information. Textomy, or "text anatomy", is text processing software that uses an SVM [19-21] to
capture the statistical pattern of word use in papers that have previously been presented to the
machine as "papers of interest", in this case a training set of abstracts that discuss biomolecular
interactions. These SVM scores are stored in the PreBIND database (Fig. 2, item 4). Textomy is ....
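The snippet does not say how Textomy trains its SVM. As an illustrative stand-in, the sketch below trains a linear SVM with Pegasos-style stochastic subgradient descent on binary bag-of-words features and uses the raw decision value as an abstract's score. The function names, the hyperparameters `lam` and `epochs`, and the toy abstracts are hypothetical, and the bias term is omitted for brevity.

```python
import random


def bag_of_words(text):
    """Binary bag-of-words features: the set of lowercased whitespace tokens."""
    return set(text.lower().split())


def train_linear_svm(examples, labels, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic subgradient descent on the regularized hinge loss.

    examples: list of feature sets; labels: +1 (of interest) or -1.
    Returns a sparse weight dict; no bias term, for brevity.
    """
    rng = random.Random(seed)
    w = {}
    order = list(range(len(examples)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # step size 1 / (lambda * t)
            x, y = examples[i], labels[i]
            margin = y * sum(w.get(f, 0.0) for f in x)
            for f in w:  # gradient step on the L2 regularizer
                w[f] *= 1.0 - eta * lam
            if margin < 1:  # hinge loss active: move toward y * x
                for f in x:
                    w[f] = w.get(f, 0.0) + eta * y
    return w


def svm_score(w, features):
    """Decision value w . x; larger means more like the positive training abstracts."""
    return sum(w.get(f, 0.0) for f in features)


abstracts = ["protein binds kinase", "kinase interacts with protein",
             "weather report today", "stock market report"]
labels = [1, 1, -1, -1]
w = train_linear_svm([bag_of_words(a) for a in abstracts], labels)
print(svm_score(w, bag_of_words("protein kinase interaction")) > 0)  # True
```

Rescaling every weight at each step keeps the code short; a practical implementation would store one shared scale factor instead of touching every weight.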

S. T. Dumais, J. Platt, D. Heckerman, and M. Sahami, "Inductive learning algorithms and representations for
text categorization," in Proc. 7th Int. Conf. on Information and Knowledge Management (CIKM '98), Nov. 1998, pp. 148-155.

Comparison of Machine Learning and Traditional.. - Chan, Lee.. (2002)

....penalty term regulating the generalization performance of the SVM. Upon training, only a fraction of
the α_i will be nonzero. The architecture of the SVM in classification is shown in Fig. 1. SVMs have
demonstrated good generalization performance in face recognition [26], text categorization [27],
and optical character recognition [28], [29]. They have also been applied to data from gene expression
[30] and DNA and protein analysis [31], [32]. D. MoGs: As mentioned in Section I, the generative approach
is to model the class-conditional density. Since the UpM of the glaucoma data contains ....
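To illustrate the generative approach this snippet introduces, the sketch below models each class-conditional density p(x | c) as a one-dimensional mixture of Gaussians and classifies with Bayes' rule. The class names and mixture parameters are made up; in practice they would be fitted to training data (for example with EM), and real data would typically be multivariate.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def mog_density(x, components):
    """Class-conditional density p(x | c) as a mixture of Gaussians.

    components: list of (weight, mean, variance) tuples whose weights sum to 1.
    """
    return sum(wt * gaussian_pdf(x, m, v) for wt, m, v in components)


def posterior(x, class_models, priors):
    """Bayes' rule: P(c | x) = p(x | c) P(c) / sum over c' of p(x | c') P(c')."""
    joint = {c: mog_density(x, comps) * priors[c] for c, comps in class_models.items()}
    evidence = sum(joint.values())
    return {c: j / evidence for c, j in joint.items()}


# Hypothetical parameters; a real model would fit these, e.g. with EM.
class_models = {
    "normal": [(0.6, 0.0, 1.0), (0.4, 3.0, 1.0)],
    "abnormal": [(1.0, 6.0, 2.0)],
}
priors = {"normal": 0.5, "abnormal": 0.5}
post = posterior(7.0, class_models, priors)
print(max(post, key=post.get))  # abnormal
```

The mixture lets a single class have several modes (here "normal" has two), which a single Gaussian per class could not represent.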

S. T. Dumais, J. Platt, D. Heckerman, and M. Sahami, "Inductive learning algorithms and representations for
text categorization," in Proc. 7th Int. Conf. on Information and Knowledge Management (CIKM '98), Nov. 1998, pp. 148-155.



Integrating Feature and Instance Selection for Text Classification

A survey on feature and instance selection as two independent problems in machine learning is
presented in [3]. In information retrieval and text classification, several works have indicated that
effective feature selection can enhance the performance of classifiers. In [5], … and [17], a few tens
or hundreds of words maximize the performance of a range of classifiers. Similar results are reported
in [9], [13] as well. SVMs are a notable exception to this since they ....
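The snippet does not name the selection criterion used in [5] and [17]. A minimal sketch with the chi-square statistic, one widely used choice, scores each word by the association between its presence and a target class and keeps the top k. The toy documents, class names, and k are illustrative only.

```python
def chi_square_scores(docs, labels, positive):
    """Chi-square association between each word's presence and class `positive`.

    docs: list of token sets. For word w the 2x2 table is
    a = positive docs with w, b = other docs with w,
    c = positive docs without w, d = other docs without w.
    """
    n = len(docs)
    scores = {}
    for w in set().union(*docs):
        a = sum(1 for doc, y in zip(docs, labels) if w in doc and y == positive)
        b = sum(1 for doc, y in zip(docs, labels) if w in doc and y != positive)
        c = sum(1 for doc, y in zip(docs, labels) if w not in doc and y == positive)
        d = n - a - b - c
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[w] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring words (ties broken alphabetically)."""
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


docs = [{"goal", "match", "team"}, {"team", "score"},
        {"vote", "election"}, {"election", "party", "team"}]
labels = ["sport", "sport", "politics", "politics"]
scores = chi_square_scores(docs, labels, positive="sport")
print(select_top_k(scores, 2))  # ['election', 'goal']
```

A word that occurs only in the other class ("election") scores as high as one that occurs only in the target class would, because chi-square measures association in either direction.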

WebWatcher: Machine Learning and Hypertext - Thorsten Joachims (1995)
noted the need for software that helps the user search for information. This paper describes the design
… is similar to the problem of Collaborative Filtering [Resnick 1994]. The target function we …
is knowledge in the nodes of the graph encoded as text. We have begun to explore ways of using this text 

mobile.csie.ntu.edu.tw/~yjhsu/courses/u1760/papers/webwatcher.ps.gz

An On-Line Cursive Word Recognition System - Seni, Nasrabadi, Srihari (1994)
word. Such string is then passed to a procedure search(…) which has knowledge about how to … ASCII
cursive words. The system first uses a filtering module, based on simple letter features, to quickly

www.cedar.buffalo.edu/Linguistics/papers/ieee.ps

Results and Challenges in Web Search Evaluation - Hawking, Craswell, Thistlewaite (1999)

Results and Challenges in Web Search Evaluation David Hawking

from pages. It is not known whether any such filtering was applied by the … Archive. 2.2 Access …
is being used in an evaluation framework within the Text Retrieval Conference (TREC) and will hopefully 

pastime.anu.edu.au/TAR/www8.ps.gz

Human Performance on Clustering Web Pages: A.. - Macskassy, Banerjee (1998)

using multiple queries or using a topic-specific search engine. One way to help in the …

fewer clusters than those with access to the full text of each web page. Generally the overlap of 

Human Performance on Clustering Web Pages: A Preliminary Study Sofus A. Macskassy.

www.cs.rutgers.edu/~davison/pubs/kdd98.ps

Book Recommending Using Text Categorization with Extracted.. - Mooney (1998) (1 citation)
and … First, an Amazon subject search is performed to obtain a list of book-description
of computerized matchmaking called collaborative filtering. The system …
in the AAAI-98/ICML-98 Workshop on Learning for Text Categorization and the AAAI-98 Workshop on

ftp.cs.utexas.edu/pub/mooney/papers/libr3-textcat98.ps.gz 

First-Order Learning for Web Mining - Craven (1998) (7 citations)
… a concept definition … rules for navigating the Web. In general,

A variety of applications, including information filtering systems and browsing assistants, have used
… Cohen [1] has used first-order methods for text classification, but the focus was on finding
… the Internet on behalf of a user. It searches the World Wide Web taking a bounded amount of
user's shoulder. 4.1 Structure of the learning module: Learner works in two versions: learning a new
Jang [21] … a system for electronic news filtering that uses text learning to generate models of
Using Multicast for Blind Distributed Web Searching and Advertising Julio C. Navas and Haym

Even when users decide to receive advertisements, filters can be used to weed out unwanted advertisements

offer can often be just the first few lines of text from the document. Unless the author

www.cs.rutgers.edu/pub/technical-reports/dcs-tr-377.ps.Z
Search and Ranking Algorithms for Locating Resources on the World Wide Web Budi Yuwono Dik L. Lee Department of Computer and
Toolkits for a Distributed, Agent-Based Web Commerce System - Guanghao Yan

it more time consuming and difficult for people to search for information or to locate relevant web sites 

information selling strategies Communication module Transaction processor User interface Products 

3-5, 1998. Toolkits for a Distributed, Agent-Based Web Commerce System Guanghao Yan Wee-Keong Ng . 
The MetaCrawler Architecture for Resource Aggregation on the Web - Selberg, Etzioni (1997) (55 citations)

The MetaCrawler Softbot is a parallel Web search service that has been available at the University 
The Harness is implemented as a collection of modules, where each module represents a particular 
users desire, such as phrase searching or filtering by location, are often absent or require a 

www.cs.washington.edu/homes/speed/papers/ieee/ieee-metacrawler.ps

Building a Digital Library for Computer Science Research.. - Ian Witten (1996) (1 citation)

report archives, and supports a variety of search types despite the fact that documents are not 

in several respects. First, it provides a full-text index of the entire contents of each document, 

a large number of documents, many of which are web pages rather than technical reports. The documents 
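A full-text index of the kind this snippet describes is usually an inverted index that maps each term to the documents containing it. A minimal sketch, assuming whitespace tokenization with no stemming or stopword removal, with a conjunctive (AND) Boolean query on top; the function names and sample documents are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Full-text inverted index: term -> set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_and(index, query):
    """Ids of documents that contain every query term (conjunctive Boolean query)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))  # copy so the index is not mutated
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {1: "digital library for computer science",
        2: "technical report archives",
        3: "computer science technical reports"}
index = build_index(docs)
print(sorted(boolean_and(index, "computer science")))  # [1, 3]
```

Without stemming, "report" and "reports" are distinct index terms, so a query for one does not match the other.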
Optimizing complex decision support queries for parallel.. - Brunie, Kosch (1997)

allocation module coupled with a randomized search module to seek for the best parallelization 

It integrates an intelligent resource allocation module coupled with a randomized search module to seek 

optimization process. We implemented a first-last page cost model, including communication costs. Latency 
Learning to Extract Symbolic Knowledge from the World.. - Craven, DiPasquo.. (1998) (66 citations)
input URL and explores pages using a breadth-first search to follow links. Each explored page is examined, 
Improving learning accuracy in information filtering. In International Conference on Machine 
to Recognize Class Instances. 5.1. Statistical Text Classification:
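The crawling strategy in the first snippet of this entry (start at an input URL, follow links breadth-first, examine each page) can be sketched as below. `get_links` is a hypothetical stand-in for fetching and parsing a page, so the example runs on an in-memory link graph; `max_pages` is an assumed page budget.

```python
from collections import deque


def bfs_crawl(start, get_links, max_pages=100):
    """Visit pages breadth-first from `start`, following links.

    get_links(url) -> iterable of linked URLs.
    Returns URLs in the order they were visited (examined).
    """
    seen = {start}
    queue = deque([start])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)  # a real crawler would examine/classify the page here
        for link in get_links(url):
            if link not in seen:  # enqueue each URL at most once
                seen.add(link)
                queue.append(link)
    return order


web = {"a": ["b", "c"], "b": ["d"], "c": ["d", "a"], "d": []}
print(bfs_crawl("a", lambda u: web.get(u, [])))  # ['a', 'b', 'c', 'd']
```

Marking URLs as seen when they are enqueued, rather than when they are visited, keeps cycles such as c -> a from re-adding pages to the queue.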
Privacy Interfaces for Information Management - Lau, Etzioni, Weld (1999) (7 citations)
her browsing history automatically. A user can search her CLIO for pages which she has previously 
has visited that contain the phrase "collaborative filtering". To discover pet owners, one might search
a match against the document's URL instead of its textual content. There is an implicit conjunction over 

ftp.cs.washington.edu/tr/1998/02/UW-CSE-98-02-01.PS.Z

Interactive Modular Programming in Scheme - Tung (1992) (2 citations)
Abstract This paper presents a module system and a programming environment designed to 
Adverbs in the transfer module of MPS - Damova (1995)
Adverbs in the transfer module of MDS Mariana Damova Universitat Stuttgart 
Spacetime Constraints Revisited - Ngo, Marks (1993) (58 citations)
refine an initial trajectory. We propose a global search algorithm that is capable of generating multiple 
be described in summary as follows: • A dynamics module (§2.1) simulates a physically correct virtual
Proceedings, Anaheim, California, August 1993, pages 343-350. © 1993 ACM. Reproduced by permission
Machine Learning for Information Extraction in Informal Domains - Freitag (1998) (20 citations)
newsgroups where computers are offered for sale in search of one that matches a user's specifications. This 
which are implemented in Perl, make use of a Perl module called Token that defines Perl versions of most 
be present in a collection, so that some sort of filtering must be performed either before or during 
Global Integration of Visual Databases - Wendy Chang (1998) (1 citation)
main components include the metadatabase, the search agent, and the query manager. The metadatabase 
the metadata and the templates. Three additional modules, metadata collector, template builder and 
systems are being developed that allow multiple text databases to be accessed over the Internet via 
allocation module coupled with a randomized search module to seek for the best parallelization

It integrates an intelligent resource allocation module coupled with a randomized search module to seek 

finding an optimal execution scenario [3]. In this context we use randomized search algorithms to present a
nature of the learning method which uses weak search methods directed by the category utility

a learning program for noisy domains that uses one module to extract object descriptions from data and 
current information retrieval systems use Boolean search methods to request and retrieve documents. While
words filtered out) are returned to the ranking module which determines the order in which the documents 
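A ranking module that orders a Boolean result set might, for instance, score each retrieved document by tf-idf over the query terms. The sketch below assumes that weighting (tf multiplied by log(N / df)); the snippet does not say which scoring the system actually uses, and the function name and sample documents are hypothetical.

```python
import math
from collections import Counter


def rank(docs, retrieved, query):
    """Order the retrieved document ids by a tf-idf score over the query terms.

    The score of document i is the sum over query terms t of tf(t, i) * log(N / df(t)),
    where N is the collection size and df(t) the number of documents containing t.
    """
    tokens = {i: text.lower().split() for i, text in docs.items()}
    df = Counter(t for toks in tokens.values() for t in set(toks))
    n = len(docs)
    query_terms = query.lower().split()

    def score(i):
        tf = Counter(tokens[i])
        return sum(tf[t] * math.log(n / df[t]) for t in query_terms if df[t])

    # Highest score first; ties fall back to ascending document id.
    return sorted(retrieved, key=lambda i: (-score(i), i))


docs = {1: "search engine search results", 2: "search interface", 3: "ranking results"}
retrieved = {1, 2}  # e.g. the documents matching the Boolean query "search"
print(rank(docs, retrieved, "search results"))  # [1, 2]
```

Separating retrieval (which documents match) from ranking (in what order to show them) mirrors the division of labor the snippet describes between the Boolean search and the ranking module.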
refine an initial trajectory. We propose a global search algorithm that is capable of generating multiple

be described in summary as follows: • A dynamics module (§2.1) simulates a physically correct virtual
a tool application position with 1-DOF requires a search through a 1-dimensional search space. To guide

Micro Planning for Mechanical Assembly Operations S. 

Significant advances have been made in the area of macro planning for assembly operations (i.e., dividing
and natural settings, for this facilitates the search for universal features of evolutionary dynamics.
is governed by some explicit dynamics and a macro level consisting of the population as a whole
We then illustrate how the scenario recognition module works through an example of utilization. Finally,

behaviors. This third module uses two kinds of context (defined as a priori information on the scene 
by Eisenstat [2]. Optimization: The performance index was the mole flow of product averaged over the
abstraction helps the system to avoid an in-depth search of those cases which are entirely different from
only for those images which contain man-made macro objects such as refineries, steel plants, etc. The
Some of the features of this system in our context are the following. 5.1 Support for complex data 



